Rapidly deploying virtual database applications using data model analysis

ABSTRACT

Techniques are described for creating a first data abstraction model for a first database. Embodiments analyze the first database to determine a first set of structural characteristics, and analyze a second database to determine a second set of structural characteristics. The analyzed second database is associated with a second data abstraction model. The first set of structural characteristics is compared with the second set of structural characteristics to identify one or more similarities between the two sets of structural characteristics. Embodiments then create the first data abstraction model for the first database, based on the identified similarities and the second data abstraction model.

BACKGROUND

The present invention generally relates to data processing and, more particularly, to normalizing data as part of a database restore.

Databases are computerized information storage and retrieval systems. A relational database management system is a computer database management system (DBMS) that uses relational techniques for storing and retrieving data. An object-oriented programming database is a database that is congruent with the data defined in object classes and subclasses. Regardless of the particular architecture, a requesting entity (e.g., an application or the operating system) in a DBMS requests access to a specified database by issuing a database access request. Such requests may include, for instance, simple catalog lookup requests or transactions and combinations of transactions that operate to read, change and add specified records in the database. These requests (i.e., queries) are often made using high-level query languages such as the Structured Query Language (SQL). Upon receiving such a request, the DBMS may execute the request against a corresponding database, and return any result of the execution to the requesting entity.

Data abstraction techniques may be used in conjunction with a database in order to improve the usability of the database. Generally, such techniques provide for an abstraction layer between the database and the users of the database, which enables queries to be issued against the database without referring to the physical structure of the underlying database. This may, in turn, enable queries to be issued against a database using more user-friendly terms. However, creating such an abstraction model for a database is often a very time-consuming and costly task, which may deter potential businesses from adopting such data abstraction techniques.

SUMMARY

A method, computer program product and system for creating a first data abstraction model for a first database. The method, computer program product and system include analyzing the first database to determine a first set of structural characteristics of the first database. The method, computer program product and system also include analyzing a second database to determine a second set of structural characteristics of the second database, wherein the second database is associated with a second data abstraction model. The method, computer program product and system further include comparing the first set of structural characteristics with the second set of structural characteristics to identify one or more similarities therebetween. Additionally, the method, computer program product and system include creating the first data abstraction model for the first database, based on the identified similarities and the second data abstraction model.

BRIEF DESCRIPTION OF THE DRAWINGS

So that the manner in which the above recited aspects are attained and can be understood in detail, a more particular description of embodiments of the invention, briefly summarized above, may be had by reference to the appended drawings.

It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.

FIGS. 1A-1B are block diagrams illustrating computer systems utilized according to embodiments of the present invention.

FIGS. 2-3 are relational views of software components for abstract query management, according to embodiments of the present invention.

FIGS. 4-5 are flow charts illustrating the operation of a runtime component, according to embodiments of the present invention.

FIG. 6 is a flow diagram illustrating a method for creating an abstraction model, according to one embodiment of the present invention.

FIG. 7 is a flow diagram illustrating a method for analyzing a database, according to one embodiment of the present invention.

FIG. 8 is a flow diagram illustrating a method for creating an abstraction model, according to one embodiment of the present invention.

DETAILED DESCRIPTION

Data abstraction models serve to improve the usability of databases by, for instance, allowing users to enter queries using more user-friendly terminology. As an example, an underlying database may store a hospital patient's first name in table “contact” and column “f_name”. However, it may be difficult for technologically unsophisticated users to construct queries against the database using the combination of this table and column. In contrast, a data abstraction model may be created with a logical field having a more user-friendly name (e.g., “FirstName”). Users may then specify the name of this logical field in abstract queries. Doing this enables less sophisticated users to more easily construct queries for the database. However, developing such a data abstraction model for the database is often costly in terms of time and resources.

Oftentimes, there are similarities between the structure of multiple databases, even though the databases may be managed by separate entities and contain different data. For example, two hospitals may each maintain separate databases for storing their respective test result data. Although in this example the hospitals are separate and distinct from one another, and even though these databases may contain entirely different data, their databases may store the test result data using the same or a similar structure. For instance, both hospitals may adhere to the same industry standard data model for storing test results (e.g., ICD-9, DRG, etc.). In other words, both hospitals may use a table(s) or subset of a table in order to store the test data, but may do so using multiple, distinct database schemas. For instance, two databases may contain test result data using codes which conform to the Logical Observation Identifiers Names and Codes (“LOINC”) standard, but each may store this data using a different database schema. Accordingly, a data abstraction model created for one of the databases may be the same as or similar to a data abstraction model for the other database.

Embodiments of the present invention generally provide techniques for creating a data abstraction model for a first database. Embodiments may analyze the first database to determine a first set of structural characteristics for the database. For example, such characteristics may include what tables the database contains, the structure of the tables, data contained in the tables, and so on. Embodiments may additionally analyze a second database, for which a data abstraction model has already been created, to determine a second set of structural characteristics for the second database. The first set of structural characteristics may then be compared with the second set of structural characteristics to identify similarities between the two databases. Embodiments may then create a data abstraction model for the first database, based on the identified similarities and the second data abstraction model for the second database. Advantageously, doing so minimizes the amount of time required to create a data abstraction model for the first database by leveraging existing data abstraction models created for similar databases.

Moreover, it is explicitly contemplated that embodiments of the invention may be provided to end users through a cloud computing infrastructure. Cloud computing generally refers to the provision of scalable computing resources as a service over a network. More formally, cloud computing may be defined as a computing capability that provides an abstraction between the computing resource and its underlying technical architecture (e.g., servers, storage, networks), enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction. Thus, cloud computing allows a user to access virtual computing resources (e.g., storage, data, applications, and even complete virtualized computing systems) in “the cloud,” without regard for the underlying physical systems (or locations of those systems) used to provide the computing resources.

Typically, cloud computing resources are provided to a user on a pay-per-use basis, where users are charged only for the computing resources actually used (e.g., an amount of storage space consumed by a user or a number of virtualized systems instantiated by the user). A user can access any of the resources that reside in the cloud at any time, and from anywhere across the Internet. In context of the present invention, a user may access applications (e.g., a DBMS) or related data available in the cloud. For example, the DBMS (configured with a database analysis component) could execute on a computing system in the cloud and process queries to access the first database received from users and applications in the cloud. In such a case, the database analysis component could determine a first set of structural characteristics for the first database. Furthermore, the database analysis component may analyze other databases and their corresponding data abstraction models in the cloud to identify similarities with the first database. In one embodiment, such analysis of other databases in the cloud may be performed anonymously, so that any confidential data stored in those databases is not included in the analysis. The database analysis component may then create a data abstraction model for the first database, based on the first set of structural characteristics and the structural characteristics and data abstraction models of the other databases in the cloud. Doing so allows a user to efficiently create a data abstraction model for the first database from any computing system attached to a network connected to the cloud (e.g., the Internet).

In the following, reference is made to embodiments of the invention. However, it should be understood that the invention is not limited to specific described embodiments. Instead, any combination of the following features and elements, whether related to different embodiments or not, is contemplated to implement and practice the invention. Furthermore, although embodiments of the invention may achieve advantages over other possible solutions and/or over the prior art, whether or not a particular advantage is achieved by a given embodiment is not limiting of the invention. Thus, the following aspects, features, embodiments and advantages are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in a claim(s). Likewise, reference to “the invention” shall not be construed as a generalization of any inventive subject matter disclosed herein and shall not be considered to be an element or limitation of the appended claims except where explicitly recited in a claim(s).

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

Referring now to FIG. 1A, FIG. 1A is a block diagram illustrating a cloud computing environment configured to run a database analysis component, according to one embodiment of the present invention. As shown, the cloud computing environment 100 contains cloud nodes 105 and a management system 155. Generally, the management system 155 is configured to direct the operations of the cloud nodes 105. For instance, the management system 155 may control which cloud node 105 new workloads are instantiated on. The cloud nodes 105 may generally be any devices which contribute resources (e.g., processing, memory, storage, etc.) to the cloud computing environment 100. Additionally, although cloud nodes 105₁ and 105_N are shown, such a depiction is without limitation and for illustrative purposes only. Moreover, one of ordinary skill in the art will quickly recognize that other cloud computing environments may contain any number of nodes.

As shown, cloud node 105₁ includes, without limitation, a processor 110₁, system storage 115₁, a memory 125₁, and a network interface card 145₁. The processor 110₁ generally retrieves and executes programming instructions stored in the memory 125₁. Similarly, the processor 110₁ stores and retrieves application data residing in the memory 125₁. An interconnect (not shown) may be used to transmit programming instructions and application data between the processor 110₁, storage 115₁, network interface 145₁, and memory 125₁. Processor 110₁ is included to be representative of a single CPU, multiple CPUs, a single CPU having multiple processing cores, a GPU and the like. More generally, processor 110₁ may be any processor capable of performing the functions described herein. Although memory 125₁ is shown as a single entity, memory 125₁ may include one or more memory devices having blocks of memory associated with physical addresses, such as random access memory (RAM), read only memory (ROM), flash memory or other types of volatile and/or non-volatile memory. Storage 115₁, such as a hard disk drive, solid state device (SSD), or flash memory storage drive, may store non-volatile data. As shown, storage 115₁ contains analysis data 120. Generally, the analysis data 120 represents any data relating to database structure analysis. For instance, the analysis data 120 may contain data relating to the structure of a physical database or a data abstraction model, as well as standards information (e.g., particular codes defined according to an industry standard). The cloud node 105₁ may connect to a network 150 (e.g., the Internet) using the network interface 145₁. Furthermore, as will be understood by one of ordinary skill in the art, any computer system capable of performing the functions described herein may be used.

In the pictured embodiment, memory 125₁ contains a database 130₁, a database analysis component 135 and an operating system 140₁. The database 130₁ may be managed by a database management system (not shown). Likewise, the database analysis component 135 is integrated into the database management system (hereinafter “DBMS”). Generally, the operating system 140₁ may be any operating system capable of performing the functions described herein. Furthermore, although various elements are shown as residing in memory 125₁ on the cloud node 105₁, such a depiction is without limitation. Of course, one of ordinary skill in the art will recognize that elements such as, for instance, the database 130₁, may reside in memory 125₁ (as shown), in storage 115₁, a combination thereof, or even on another computer system entirely, and that the depiction shown in FIG. 1A is for illustrative purposes only.

Generally, the database analysis component 135 may analyze the database 130₁ to determine a first set of structural characteristics for the database 130₁. These characteristics may include information related to the structure of the database, such as the tables contained in the database and the structure of those tables. The characteristics may further include information on the data contained in the tables. For instance, the database analysis component 135 may analyze the database 130₁ and determine that one column of data conforms to a particular industry standard. The database analysis component 135 may also examine relationships between the tables in the database. One example of such a relationship would be if a first table of the database 130₁ contains references to a second table of the database 130₁ (e.g., a foreign key).
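By way of illustration only, the structural analysis described above might be sketched as follows in Python. The sketch assumes a SQLite database, which the description does not require, and the names analyze_structure and build_demo_db, as well as the demo tables, are hypothetical. It collects, for each table, the column names and declared types and any foreign key references to other tables.

```python
import sqlite3


def analyze_structure(conn):
    """Collect per-table columns and foreign keys (assumes a SQLite connection)."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    structure = {}
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(row[1], row[2].upper())
                   for row in conn.execute(f'PRAGMA table_info("{table}")')]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        foreign_keys = [(row[3], row[2], row[4])
                        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')]
        structure[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return structure


def build_demo_db():
    """Create a small in-memory database with hypothetical hospital tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE contact (id INTEGER PRIMARY KEY, f_name TEXT, l_name TEXT);
        CREATE TABLE test_result (
            id INTEGER PRIMARY KEY,
            patient_id INTEGER REFERENCES contact(id),
            code TEXT,
            value REAL
        );
    """)
    return conn


demo_structure = analyze_structure(build_demo_db())
```

In this sketch only schema metadata is read; examining the data contained in the tables (e.g., checking a column against an industry standard) would be a further step that is not shown.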

The database analysis component 135 may further analyze the database 130_N residing on cloud node 105_N to identify a second set of structural characteristics. Similar to the analysis for the database 130₁, this analysis may examine the structure of tables within the database, as well as data contained in the tables. The database analysis component 135 may then compare the first set of structural characteristics with the second set of structural characteristics to identify similarities between the database 130₁ and the database 130_N.
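One possible way to compare two sets of structural characteristics is sketched below in Python. The description does not prescribe a similarity measure; the Jaccard score over normalized column names and types, the 0.5 threshold, and the function and table names are all assumptions made for illustration.

```python
def normalize(name):
    """Lowercase a column name and drop underscores so 'f_name' matches 'fname'."""
    return name.lower().replace("_", "")


def table_similarity(columns_a, columns_b):
    """Jaccard similarity of two tables' (normalized column name, type) sets."""
    set_a = {(normalize(n), t.upper()) for n, t in columns_a}
    set_b = {(normalize(n), t.upper()) for n, t in columns_b}
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def find_similar_tables(first, second, threshold=0.5):
    """Pair each table of `first` with its best-scoring table of `second`."""
    matches = {}
    for name_a, cols_a in first.items():
        best_name, best_score = None, 0.0
        for name_b, cols_b in second.items():
            score = table_similarity(cols_a, cols_b)
            if score > best_score:
                best_name, best_score = name_b, score
        # Keep only matches at or above the (assumed) threshold.
        if best_name is not None and best_score >= threshold:
            matches[name_a] = (best_name, best_score)
    return matches


# Hypothetical structural characteristics: table -> [(column, type)]
first_db = {"patients": [("id", "INTEGER"), ("fname", "TEXT"), ("lname", "TEXT")]}
second_db = {
    "contact": [("id", "INTEGER"), ("f_name", "TEXT"), ("l_name", "TEXT")],
    "test_result": [("id", "INTEGER"), ("code", "TEXT"), ("value", "REAL")],
}
similar = find_similar_tables(first_db, second_db)
```

Here the hypothetical "patients" table pairs with "contact" because, after normalization, the two tables have the same columns and types, while "test_result" shares only the "id" column.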

In accordance with embodiments of the present invention, a data abstraction model 160 may be provided for the database 130_N. Embodiments that use a data abstraction model allow for database queries to be written in the form of abstract queries composed using one or more logical fields. Returning to the present example, the database analysis component 135 may create a data abstraction model for the database 130₁, based on the identified similarities between the database 130₁ and the database 130_N, and the data abstraction model 160 provided for the database 130_N. For instance, the database analysis component 135 may determine that a first table and a second table from the database 130₁ and the database 130_N, respectively, are related, since the tables are structured in the exact same way (although the tables may contain different data). Upon determining these two tables are related, the database analysis component 135 may create portions of the data abstraction model for the database 130₁, based on portions of the data abstraction model 160 corresponding to the related tables.

As an example, assume that the first table and the second table both contain contact information for hospital patients, that the first table in database 130₁ contains a column named “fname” storing the first name of each patient, and that the second table in database 130_N contains a column named “f_name” for storing the first name of each patient. Furthermore, assume that the data abstraction model 160 contains a logical field named “FirstName” which maps to the “f_name” column in the table of database 130_N. Upon determining that the first table and the second table are related, the database analysis component 135 may further analyze the data abstraction model 160 and, based on the determination that the logical field “FirstName” maps to the column of “f_name”, the database analysis component 135 could create a logical field named “FirstName” in the data abstraction model for the database 130₁ which maps to the column named “fname”. Advantageously, doing so enables the data abstraction model for the database 130₁ to be quickly and efficiently created, thus saving on the costs in terms of time and resources used to create the data abstraction model.
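The “FirstName” example above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the data abstraction model is reduced to a dictionary from logical field name to (table, column), columns are paired by comparing names with underscores removed, and derive_model and the table names are hypothetical.

```python
def normalize(name):
    """Lowercase a column name and drop underscores so 'f_name' matches 'fname'."""
    return name.lower().replace("_", "")


def derive_model(existing_model, table_matches, first_db_columns):
    """Copy logical fields from an existing model onto a structurally similar database.

    existing_model:   {logical field: (table, column)} for the second database
    table_matches:    {table in second database: related table in first database}
    first_db_columns: {table in first database: [column names]}
    """
    new_model = {}
    for logical_name, (table, column) in existing_model.items():
        target_table = table_matches.get(table)
        if target_table is None:
            continue  # no related table was identified in the first database
        by_normalized = {normalize(c): c for c in first_db_columns[target_table]}
        target_column = by_normalized.get(normalize(column))
        if target_column is not None:
            new_model[logical_name] = (target_table, target_column)
    return new_model


# Model 160 for database 130_N maps "FirstName" to contact.f_name.
model_160 = {"FirstName": ("contact", "f_name")}
new_model = derive_model(
    model_160,
    table_matches={"contact": "contact"},
    first_db_columns={"contact": ["id", "fname"]},
)
```

Logical fields whose table or column has no counterpart in the first database are simply left out of the derived model in this sketch.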

An Exemplary Query Execution Runtime Environment

Referring now to FIG. 1B, a computing environment 100 is shown. In general, the environment includes computer system 175 and a plurality of networked devices 176. The computer system 175 may represent any type of computer, computer system or other programmable electronic device, including a client computer, a server computer, a portable computer, an embedded controller, a PC-based server, a minicomputer, a midrange computer, a mainframe computer, and other computers adapted to support the methods, apparatus, and article of manufacture of the invention. Furthermore, as discussed above, in one embodiment, the computer system 175 refers to a cloud node in a cloud computing environment. In one embodiment, the computer system 175 is an eServer computer available from International Business Machines of Armonk, N.Y.

Illustratively, the computer system 175 comprises a networked system. However, the computer system 175 may also comprise a standalone device. In any case, it is understood that FIG. 1B is merely one configuration for a computer system. Embodiments of the invention can apply to any comparable configuration, regardless of whether the computer system 175 is a complicated multi-user apparatus, a single-user workstation, or a network appliance that does not have non-volatile storage of its own.

The embodiments of the present invention may also be practiced in distributed computing environments in which tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices. In this regard, the computer system 175 and/or one or more of the networked devices 176 may be thin clients which perform little or no processing.

As shown, the computer system 175 includes a number of operators and peripheral systems. For instance, the system 175 includes a mass storage interface 167 operably connected to a direct access storage device 124, a video interface 170 operably connected to a display 172, and a network interface 138 operably connected to the plurality of networked devices 176. The display 172 may be any video output device for outputting viewable information.

Computer system 175 is shown comprising at least one processor 110, which obtains instructions and data via a bus 144 from a main memory 125. The processor 110 could be any processor adapted to support the methods of the invention. The main memory 125 is any memory sufficiently large to hold the necessary programs and data structures. Main memory 125 could be one or a combination of memory devices, including Random Access Memory, nonvolatile or backup memory (e.g., programmable or Flash memories, read-only memories, etc.). In addition, memory 125 may be considered to include memory physically located elsewhere in the computer system 175, for example, any storage capacity used as virtual memory or stored on a mass storage device (e.g., direct access storage device 124) or on another computer coupled to the computer system 175 via bus 144.

The memory 125 is shown configured with an operating system 140. The operating system 140 is the software used for managing the operation of the computer system 175. Examples of the operating system 140 include IBM OS/400®, UNIX, Microsoft Windows®, and the like.

The memory 125 further includes one or more applications 151 and an abstract model interface 161. The applications 151 and the abstract model interface 161 are software products comprising a plurality of instructions that are resident at various times in various memory and storage devices in the computer system 175. When read and executed by one or more processors 110 in the computer system 175, the applications 151 and the abstract model interface 161 cause the computer system 175 to perform the steps necessary to execute steps or elements embodying the various aspects of the invention. The applications 151 (and more generally, any requesting entity, including the operating system 140) are configured to issue queries against a database 130 (shown in storage 124). The database 130 is representative of any collection of data regardless of the particular physical representation of the data. A physical representation of data defines an organizational schema of the data. By way of illustration, the database 130 may be organized according to a relational schema (accessible by SQL queries) or according to an XML schema (accessible by XML queries). However, the invention is not limited to a particular schema and contemplates extension to schemas presently unknown. As used herein, the term “schema” generically refers to a particular arrangement of data.

The queries issued by the applications 151 are defined according to an application query specification 152 included with each application 151. The queries issued by the applications 151 may be predefined (i.e., hard coded as part of the applications 151) or may be generated in response to input (e.g., user input). In either case, the queries (referred to herein as “abstract queries”) are composed using logical fields defined by the abstract model interface 161. A logical field defines an abstract view of data whether as an individual data item or a data structure in the form of, for example, a database table. In particular, the logical fields used in the abstract queries are defined by a data abstraction model component 160 of the abstract model interface 161. The runtime component 164 transforms the abstract queries into concrete queries having a form consistent with the physical representation of the data contained in the database 130. The concrete queries can be executed by the runtime component 164 against the database 130.

Referring now to FIG. 2, a relational view illustrating interaction of the runtime component 164, the application 151, and the data abstraction model 160 at query execution runtime is shown. The data abstraction model 160 is also referred to herein as a “logical representation” because the data abstraction model 160 defines logical fields corresponding to data structures in a database (e.g., database 130), thereby providing an abstract (i.e., a logical) view of the data in the database. A data structure is a physical arrangement of the data, such as an arrangement in the form of a database table or a column of the database table. In a relational database environment having a multiplicity of database tables, a specific logical representation having specific logical fields can be provided for each database table. In this case, all specific logical representations together constitute the data abstraction model 160. Physical entities of the data are arranged in the database 130 according to a physical representation of the data. A physical entity of data (interchangeably referred to as a physical data entity) is a data item in an underlying physical representation. Accordingly, a physical data entity is the data included in a database table or in a column of the database table, i.e., the data itself. By way of illustration, two physical representations are shown, an XML data representation 214₁ and a relational data representation 214₂. However, the physical representation 214_N indicates that any other physical representation, known or unknown, is contemplated. In one embodiment, a different single data abstraction model 160 is provided for each separate physical representation 214, as explained above for the case of a relational database environment. In an alternative embodiment, a single data abstraction model 160 contains field specifications (with associated access methods) for two or more physical representations 214. A field specification is a description of a logical field and generally comprises a mapping rule that maps the logical field to a data structure(s) of a particular physical representation.

Using a logical representation of the data, the application query specification 152 specifies one or more logical fields to compose a resulting query. A requesting entity (e.g., the application 151) issues the resulting query 202 as defined by an application query specification of the requesting entity. In one embodiment, the abstract query 202 may include both criteria used for data selection and an explicit specification of result fields to be returned based on the data selection criteria. An example of the selection criteria and the result field specification of the abstract query 202 is shown in FIG. 3. Accordingly, the abstract query 202 illustratively includes selection criteria 304 and a result field specification 306.

The resulting query 202 is generally referred to herein as an “abstract query” because the query is composed according to abstract (i.e., logical) fields rather than by direct reference to the underlying data structures in the database 130. As a result, abstract queries may be defined that are independent of the particular underlying physical data representation used. For execution, the abstract query is transformed into a concrete query consistent with the underlying physical representation of the data using the data abstraction model 160. The concrete query is executable against the database 130. An exemplary method for transforming the abstract query into a concrete query is described below with reference to FIGS. 4-5.

In general, the data abstraction model 160 exposes information as a set of logical fields that may be used within an abstract query to specify criteria for data selection and specify the form of result data returned from a query operation. The logical fields are defined independently of the underlying physical representation being used in the database 130, thereby allowing abstract queries to be formed that are loosely coupled to the underlying physical representation.

An Exemplary Data Abstraction Model

Referring now to FIG. 3, a relational view illustrating interaction of the abstract query 202 and the data abstraction model 160 is shown. In one embodiment, the data abstraction model 160 comprises a plurality of field specifications 308₁, 308₂, 308₃, 308₄ and 308₅ (five shown by way of example), collectively referred to as the field specifications 308. Specifically, a field specification is provided for each logical field available for composition of an abstract query. Each field specification may contain one or more attributes. Illustratively, the field specifications 308 include a logical field name attribute 320₁, 320₂, 320₃, 320₄, 320₅ (collectively, field name 320) and an associated access method attribute 322₁, 322₂, 322₃, 322₄, 322₅ (collectively, access methods 322). Each attribute may have a value. For example, logical field name attribute 320₁ has the value “FirstName” and access method attribute 322₁ has the value “Simple.” Furthermore, each attribute may include one or more associated abstract properties. Each abstract property describes a characteristic of a data structure and has an associated value. As indicated above, a data structure refers to a part of the underlying physical representation that is defined by one or more physical entities of the data corresponding to the logical field. In particular, an abstract property may represent data location metadata abstractly describing a location of a physical data entity corresponding to the data structure, like a name of a database table or a name of a column in a database table. Illustratively, the access method attribute 322₁ includes data location metadata “Table” and “Column.” Furthermore, data location metadata “Table” has the value “contact” and data location metadata “Column” has the value “f_name.” Accordingly, assuming an underlying relational database schema in the present example, the values of data location metadata “Table” and “Column” point to a table “contact” having a column “f_name.”

In one embodiment, groups (i.e., two or more) of logical fields may be part of categories. Accordingly, the data abstraction model 160 includes a plurality of category specifications 310₁ and 310₂ (two shown by way of example), collectively referred to as the category specifications. In one embodiment, a category specification is provided for each logical grouping of two or more logical fields. For example, logical fields 308₁₋₃ and 308₄₋₅ are part of the category specifications 310₁ and 310₂, respectively. A category specification is also referred to herein simply as a “category”. The categories are distinguished according to a category name, e.g., category names 330₁ and 330₂ (collectively, category name(s) 330). In the present illustration, the logical fields 308₁₋₃ are part of the “Name and Address” category and logical fields 308₄₋₅ are part of the “Birth and Age” category.

The access methods 322 generally associate the logical field names with data in the database (e.g., database 130 of FIG. 1B). Any number of access methods is contemplated depending upon the number of different types of logical fields to be supported. In one embodiment, access methods for simple fields, filtered fields and composed fields are provided. The field specifications 308₁, 308₂ and 308₅ exemplify simple field access methods 322₁, 322₂, and 322₅, respectively. Simple fields are mapped directly to a particular data structure in the underlying physical representation (e.g., a field mapped to a given database table and column). By way of illustration, as described above, the simple field access method 322₁ maps the logical field name 320₁ (“FirstName”) to a column named “f_name” in a table named “contact.” The field specification 308₃ exemplifies a filtered field access method 322₃. Filtered fields identify an associated data structure and provide filters used to define a particular subset of items within the physical representation. An example is provided in FIG. 3 in which the filtered field access method 322₃ maps the logical field name 320₃ (“AnyTownLastName”) to data in a column named “l_name” in a table named “contact” and defines a filter for individuals in the city of “Anytown.” Another example of a filtered field is a New York ZIP code field that maps to the physical representation of ZIP codes and restricts the data only to those ZIP codes defined for the state of New York. The field specification 308₄ exemplifies a composed field access method 322₄. Composed access methods compute a logical field from one or more data structures using an expression supplied as part of the access method definition. In this way, information which does not exist in the underlying physical data representation may be computed. In the example illustrated in FIG. 3, the composed field access method 322₄ maps the logical field name 320₄ “AgeInDecades” to “AgeInYears/10.” Another example is a sales tax field that is composed by multiplying a sales price field by a sales tax rate.
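The three access method types described above can be pictured as plain data records. The following Python sketch is illustrative only: the class names, the `FIELDS` dictionary and the `describe` helper are hypothetical and do not appear in the described embodiments; only the field names, table name and column names come from the FIG. 3 example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleAccess:
    """Maps a logical field directly to one table and column."""
    table: str
    column: str


@dataclass(frozen=True)
class FilteredAccess:
    """Maps to a table and column, restricted by a filter expression."""
    table: str
    column: str
    filter_expr: str


@dataclass(frozen=True)
class ComposedAccess:
    """Computes a value from other logical fields via an expression."""
    expression: str


# Hypothetical in-memory model: logical field name -> access method.
FIELDS = {
    "FirstName": SimpleAccess("contact", "f_name"),
    "AnyTownLastName": FilteredAccess("contact", "l_name", "contact.city = 'Anytown'"),
    "AgeInYears": SimpleAccess("contact", "age"),
    "AgeInDecades": ComposedAccess("AgeInYears/10"),
}


def describe(name: str) -> str:
    """Return a one-line description of how a logical field is resolved."""
    method = FIELDS[name]
    if isinstance(method, SimpleAccess):
        return f"{name}: simple -> {method.table}.{method.column}"
    if isinstance(method, FilteredAccess):
        return (f"{name}: filtered -> {method.table}.{method.column} "
                f"where {method.filter_expr}")
    return f"{name}: composed -> {method.expression}"
```

Keeping each access method as its own record type makes it straightforward to add further method types later without changing the existing ones.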

It is contemplated that the formats for any given data type (e.g., dates, decimal numbers, etc.) of the underlying data may vary. Accordingly, in one embodiment, the field specifications 308 include a type attribute which reflects the format of the underlying data. However, in another embodiment, the data format of the field specifications 308 is different from the associated underlying physical data, in which case a conversion of the underlying physical data into the format of the logical field is required.

By way of example, the field specifications 308 of the data abstraction model 160 shown in FIG. 3 are representative of logical fields mapped to data represented in the relational data representation 214₂ shown in FIG. 2. However, other instances of the data abstraction model 160 map logical fields to other physical representations, such as XML.

An illustrative abstract query corresponding to the abstract query 202 shown in FIG. 3 is shown in Table I below. By way of illustration, the illustrative abstract query is defined using XML. However, any other language may be used to advantage.

TABLE I
ABSTRACT QUERY EXAMPLE
001 <?xml version="1.0"?>
002 <!--Query string representation: (AgeInYears > "55"-->
003 <QueryAbstraction>
004   <Selection>
005     <Condition internalID="4">
006     <Condition field="AgeInYears" operator="GT"
007       value="55" internalID="1"/>
008   </Selection>
009   <Results>
010     <Field name="FirstName"/>
011     <Field name="AnyTownLastName"/>
012   </Results>
013 </QueryAbstraction>

Illustratively, the abstract query shown in Table I includes a selection specification (lines 004-008) containing selection criteria and a result specification (lines 009-012). In one embodiment, a selection criterion consists of a field name (for a logical field), a comparison operator (=, >, <, etc.) and a value expression (what the field is being compared to). In one embodiment, a result specification is a list of abstract fields that are to be returned as a result of query execution. A result specification in the abstract query may consist of a field name and sort criteria.
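As a rough illustration of how selection criteria and result fields might be pulled out of such a query, the sketch below parses a simplified, well-formed variant of the Table I query with Python's standard `xml.etree.ElementTree` module. The simplified XML (a single self-closing `Condition` element) and the function name `parse_abstract_query` are assumptions made for this example, not part of the described embodiments.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed variant of the Table I query (assumption).
ABSTRACT_QUERY = """<?xml version="1.0"?>
<QueryAbstraction>
  <Selection>
    <Condition field="AgeInYears" operator="GT" value="55" internalID="1"/>
  </Selection>
  <Results>
    <Field name="FirstName"/>
    <Field name="AnyTownLastName"/>
  </Results>
</QueryAbstraction>"""


def parse_abstract_query(xml_text):
    """Return (selection criteria, result field names) from an abstract query."""
    root = ET.fromstring(xml_text)
    # Each criterion is a (field name, operator, value expression) triple.
    criteria = [
        (cond.get("field"), cond.get("operator"), cond.get("value"))
        for cond in root.findall("./Selection/Condition")
    ]
    # The result specification is the list of logical fields to return.
    results = [field.get("name") for field in root.findall("./Results/Field")]
    return criteria, results
```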

An illustrative data abstraction model (“DAM”) corresponding to the data abstraction model 160 shown in FIG. 3 is shown in Table II below. By way of illustration, the illustrative data abstraction model is defined using XML. However, any other language may be used to advantage.

TABLE II
DATA ABSTRACTION MODEL EXAMPLE
001 <?xml version="1.0"?>
002 <DataAbstraction>
003   <Category name="Name and Address">
004     <Field queryable="Yes" name="FirstName" displayable="Yes">
005       <AccessMethod>
006         <Simple columnName="f_name" tableName="contact"></Simple>
007       </AccessMethod>
008     </Field>
009     <Field queryable="Yes" name="LastName" displayable="Yes">
010       <AccessMethod>
011         <Simple columnName="l_name" tableName="contact"></Simple>
012       </AccessMethod>
013     </Field>
014     <Field queryable="Yes" name="AnyTownLastName" displayable="Yes">
015       <AccessMethod>
016         <Filter columnName="l_name" tableName="contact">
017         </Filter="contact.city=Anytown">
018       </AccessMethod>
019     </Field>
020   </Category>
021   <Category name="Birth and Age">
022     <Field queryable="Yes" name="AgeInDecades" displayable="Yes">
023       <AccessMethod>
024         <Composed columnName="age" tableName="contact">
025         </Composed Expression="columnName/10">
026       </AccessMethod>
027     </Field>
028     <Field queryable="Yes" name="AgeInYears" displayable="Yes">
029       <AccessMethod>
030         <Simple columnName="age" tableName="contact"></Simple>
031       </AccessMethod>
032     </Field>
033   </Category>
034 </DataAbstraction>

By way of example, note that lines 004-008 correspond to the first field specification 308₁ of the DAM 160 shown in FIG. 3 and lines 009-013 correspond to the second field specification 308₂.

Transforming an Abstract Query into a Concrete Query

Referring now to FIG. 4, an illustrative runtime method 400 exemplifying one embodiment of the operation of the runtime component 164 of FIG. 1B is shown. The method 400 is entered at step 402 when the runtime component 164 receives as input an abstract query (such as the abstract query shown in Table I). At step 404, the runtime component 164 reads and parses the abstract query and locates individual selection criteria and desired result fields. At step 406, the runtime component 164 enters a loop (comprising steps 406, 408, 410 and 412) for processing each query selection criteria statement present in the abstract query, thereby building a data selection portion of a concrete query. In one embodiment, a selection criterion consists of a field name (for a logical field), a comparison operator (=, >, <, etc.) and a value expression (what the field is being compared to). At step 408, the runtime component 164 uses the field name from a selection criterion of the abstract query to look up the definition of the field in the data abstraction model 160. As noted above, the field definition includes a definition of the access method used to access the data structure associated with the field. The runtime component 164 then builds (step 410) a concrete query contribution for the logical field being processed. As defined herein, a concrete query contribution is a portion of a concrete query that is used to perform data selection based on the current logical field. A concrete query is a query represented in languages like SQL and XML Query and is consistent with the data of a given physical data repository (e.g., a relational database or XML repository). Accordingly, the concrete query is used to locate and retrieve data from the physical data repository, represented by the database 130 shown in FIG. 1B. The concrete query contribution generated for the current field is then added to a concrete query statement. The method 400 then returns to step 406 to begin processing for the next field of the abstract query. Accordingly, the process entered at step 406 is iterated for each data selection field in the abstract query, thereby contributing additional content to the eventual query to be performed.

After building the data selection portion of the concrete query, the runtime component 164 identifies the information to be returned as a result of query execution. As described above, in one embodiment, the abstract query defines a list of result fields, i.e., a list of logical fields that are to be returned as a result of query execution, referred to herein as a result specification. A result specification in the abstract query may consist of a field name and sort criteria. Accordingly, the method 400 enters a loop at step 414 (defined by steps 414, 416, 418 and 420) to add result field definitions to the concrete query being generated. At step 416, the runtime component 164 looks up a result field name (from the result specification of the abstract query) in the data abstraction model 160 and then retrieves a result field definition from the data abstraction model 160 to identify the physical location of data to be returned for the current logical result field. The runtime component 164 then builds (at step 418) a concrete query contribution (of the concrete query that identifies physical location of data to be returned) for the logical result field. At step 420, the concrete query contribution is then added to the concrete query statement. Once each of the result specifications in the abstract query has been processed, the concrete query is executed at step 422.
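The two loops of method 400 can be sketched in a few lines of Python. This is a minimal illustration under several assumptions that are not stated in the description: every field uses a simple access method, the concrete query is SQL, selection criteria are combined with AND, and values are passed as `?` placeholders. The names `MODEL`, `OPERATORS` and `to_concrete_query` are hypothetical.

```python
# Hypothetical model: logical field name -> (table, column); simple access only.
MODEL = {
    "FirstName": ("contact", "f_name"),
    "LastName": ("contact", "l_name"),
    "AgeInYears": ("contact", "age"),
}

# Assumed mapping from abstract operator codes to SQL operators.
OPERATORS = {"EQ": "=", "GT": ">", "LT": "<"}


def to_concrete_query(criteria, result_fields):
    """Build (sql, params) from selection criteria and result field names."""
    where_parts, params, tables = [], [], []

    # First loop (steps 406-412): one contribution per selection criterion.
    for field, op, value in criteria:
        table, column = MODEL[field]  # step 408: look up the field definition
        where_parts.append(f"{table}.{column} {OPERATORS[op]} ?")  # step 410
        params.append(value)
        if table not in tables:
            tables.append(table)

    # Second loop (steps 414-420): one contribution per result field.
    select_parts = []
    for field in result_fields:
        table, column = MODEL[field]  # step 416
        select_parts.append(f"{table}.{column} AS {field}")  # step 418
        if table not in tables:
            tables.append(table)

    sql = f"SELECT {', '.join(select_parts)} FROM {', '.join(tables)}"
    if where_parts:
        sql += " WHERE " + " AND ".join(where_parts)
    return sql, params
```

A query spanning several tables would additionally need join conditions, which this sketch leaves out.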

One embodiment of a method 500 for building a concrete query contribution for a logical field according to steps 410 and 418 is described with reference to FIG. 5. At step 502, the method 500 queries whether the access method associated with the current logical field is a simple access method. If so, the concrete query contribution is built (step 504) based on physical data location information and processing then continues according to method 400 described above. Otherwise, processing continues to step 506 to query whether the access method associated with the current logical field is a filtered access method. If so, the concrete query contribution is built (step 508) based on physical data location information for a given data structure(s). At step 510, the concrete query contribution is extended with additional logic (filter selection) used to subset data associated with the given data structure(s). Processing then continues according to method 400 described above.

If the access method is not a filtered access method, processing proceeds from step 506 to step 512 where the method 500 queries whether the access method is a composed access method. If the access method is a composed access method, the physical data location for each sub-field reference in the composed field expression is located and retrieved at step 514. At step 516, the physical field location information of the composed field expression is substituted for the logical field references of the composed field expression, whereby the concrete query contribution is generated. Processing then continues according to method 400 described above.

If the access method is not a composed access method, processing proceeds from step 512 to step 518. Step 518 is representative of any other access method types contemplated as embodiments of the present invention. However, it should be understood that embodiments are contemplated in which less than all the available access methods are implemented. For example, in a particular embodiment only simple access methods are used. In another embodiment, only simple access methods and filtered access methods are used.
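The branching of method 500 may be sketched as a single dispatch function. The dictionary layout of `MODEL`, the `refs` list naming the sub-fields of a composed expression, and the function name `build_contribution` are all assumptions made for this illustration; the description does not prescribe a data format.

```python
import re

# Hypothetical model; "refs" lists the logical fields a composed expression uses.
MODEL = {
    "AgeInYears": {"access": "simple", "table": "contact", "column": "age"},
    "AnyTownLastName": {
        "access": "filtered", "table": "contact", "column": "l_name",
        "filter": "contact.city = 'Anytown'",
    },
    "AgeInDecades": {
        "access": "composed", "expression": "AgeInYears/10",
        "refs": ["AgeInYears"],
    },
}


def build_contribution(field_name, model):
    """Return (physical expression, optional filter) for one logical field."""
    spec = model[field_name]
    kind = spec["access"]
    if kind == "simple":  # steps 502-504
        return f'{spec["table"]}.{spec["column"]}', None
    if kind == "filtered":  # steps 506-510: location plus filter logic
        return f'{spec["table"]}.{spec["column"]}', spec["filter"]
    if kind == "composed":  # steps 512-516
        expr = spec["expression"]
        for ref in spec["refs"]:
            # Step 514: resolve the physical location of each sub-field.
            physical, _ = build_contribution(ref, model)
            # Step 516: substitute it for the logical field reference.
            expr = re.sub(rf"\b{re.escape(ref)}\b", lambda _m: physical, expr)
        return f"({expr})", None
    # Step 518: any other access method type is outside this sketch.
    raise NotImplementedError(kind)
```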

Rapid Virtual Database Application Deployment

FIG. 6 is a flow diagram illustrating a method for creating an abstraction model, according to one embodiment of the present invention. As shown, the method 600 begins at step 605, wherein a customer requests to run an application having one or more data sources in a cloud computing environment. The customer further specifies one or more data sources to deploy in the cloud (step 610). For instance, the customer may select one or more databases to be deployed into the cloud. For purposes of this example, assume that at least one of these selected databases is to be used in conjunction with a logical data layer where abstract queries may be issued against the database (as described above), but that no data abstraction model (e.g., data abstraction model component 162) has yet been created for the database.

Once the customer selects the data sources to deploy into the cloud, the database analysis component 135 receives this information from the customer and determines whether a virtual database configuration already exists for any of the specified data sources (step 615). If the database analysis component 135 determines the configuration information already exists, the database analysis component 135 loads the existing configuration (step 620) and the method 600 ends. This loading process may include, for instance, creating a new data abstraction model for the specified data sources, based on the existing configuration information. For instance, if the database analysis component 135 has previously created a data abstraction model for the selected database, the database analysis component 135 may load the previously created data abstraction model for the selected database. This may be the case when, for instance, the customer is deploying the same database multiple times into the cloud (e.g., for redundancy or load balancing purposes).

If the database analysis component 135 determines that no known configuration information exists, the method 600 enters a loop where the database analysis component 135 determines the structure of the specified data sources (step 625). At step 630, the database analysis component 135 determines whether there are tables remaining in the data sources to interrogate, and if so, the database analysis component 135 determines whether the structure of the next table to interrogate matches a known table structure (step 635). The known table structures may be provided based on previous databases and abstract data models created and/or used by the customer. In one embodiment where the database analysis component 135 is deployed in a cloud computing environment, the database analysis component 135 may analyze other databases (which may be owned and/or operated by other customers) in the cloud in order to identify known table structures. In a particular embodiment, the database analysis component 135 may be configured to analyze these other databases anonymously, so as not to intrude upon any confidential data contained in these databases. For instance, the database analysis component 135 may be configured to analyze the structure of the other database (e.g., determining that a particular table contains two columns having VARCHAR and BOOLEAN values, respectively), without looking at the actual data values contained in the other database (e.g., the VARCHAR and BOOLEAN data values contained in the table).
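One way to picture a structure-only comparison is to reduce each table to a signature built from its column data types and to look that signature up among known structures. The sketch below is an assumption-laden illustration: the description does not define how structures are compared, and the names `table_signature`, `find_known_configuration`, `KNOWN` and `"config-A"` are hypothetical. Note that no data values are read at any point.

```python
def table_signature(columns):
    """Reduce a table to a sorted tuple of its column data types.

    `columns` is an iterable of (column_name, data_type) pairs. Column
    names and row values are deliberately not part of the signature.
    """
    return tuple(sorted(data_type.upper() for _name, data_type in columns))


def find_known_configuration(columns, known):
    """Return the configuration id for a matching structure, or None."""
    return known.get(table_signature(columns))


# Hypothetical registry of known table structures and their configurations.
KNOWN = {
    table_signature([("label", "VARCHAR"), ("active", "BOOLEAN")]): "config-A",
}
```

A signature this coarse would match many unrelated tables, so a practical comparison would likely also weigh column names, keys or constraints.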

In yet another embodiment, the database analysis component 135 may be configured to analyze a particular group of other databases. For instance, the customer may manually identify one or more other databases which the database analysis component 135 could analyze in creating the data abstraction model for the selected database. Such a grouping may be identified explicitly. In one embodiment, the grouping may be determined based on a particular class specified for the selected database. Of course, the above examples are without limitation and are provided for illustrative purposes only. Moreover, one of ordinary skill in the art will recognize that any number of data sources and data configurations may be analyzed in accordance with embodiments of the present invention.

If the database analysis component 135 determines the table structure does not match any known table structures, the database analysis component 135 analyzes the structure of the table (step 640). The database analysis component 135 then creates a new virtual database configuration for the table, based on the structure analysis for the table (step 645). Here, the database analysis component 135 may create the portion of the data abstraction model corresponding to the table of the data source, but may set particular fields in the data abstraction model as unmatched. In one embodiment, this is done by setting a flag associated with the fields in the data abstraction model. In other embodiments, the database analysis component 135 may leave the fields blank or set the field to a default value (e.g., “unmatched”) to indicate that no match was found. Doing so enables the customer to identify and manually enter values for these fields at some later point in time. Once the new configuration is created, the method returns to step 630, where the database analysis component 135 determines whether there are additional tables to interrogate.

If at step 635 the database analysis component 135 determines that the table structure matches a known table structure, the database analysis component 135 then further determines whether it is a complete match or a partial match. In the event the database analysis component 135 determines the table completely matches a known structure, the database analysis component 135 then loads the existing configuration information for the known structure (step 620) and the method 600 ends. For instance, although the selected database in its entirety may not match any known configurations, the structure of particular tables within the database may match known table structures. In such a case, the database analysis component 135 may use the configuration for the known table structures in creating the abstraction model for the tables in the selected database.

For instance, if the table in the selected database conforms to an industry standard for database tables, and if the database analysis component 135 has previously created and/or processed previous data abstraction models for other tables related to this industry standard, then the database analysis component 135 could generate a new data abstraction model for the selected database using the structure and corresponding mappings of the previous data abstraction models. As an example, assume that a previously-processed database has a table “T_DATA” for storing test data, which contains a column “code” for storing diagnosis codes, and that the previously-processed database is associated with a previous data abstraction model containing a logical field named “Diagnosis Code” that maps to table “T_DATA” and column “code”. For the purposes of this example, further assume that the database selected by the customer contains a table “TestData” for storing test data, and this table contains a column “diagCode” for storing the diagnosis codes.

Here, the database analysis component 135 may analyze the selected database and determine that the table “TestData” and column “diagCode” from the selected database correspond to the table “T_DATA” and column “code” in the previously-processed database. Based on this, the database analysis component 135 may then create a logical field named “Diagnosis Code” in the data abstraction model for the selected database, and map this field to the corresponding table and column in the selected database (i.e., the table “TestData” and column “diagCode”). The database analysis component 135 may also incorporate other information into the logical field, based on the corresponding logical fields in the previous data model. For instance, the access method associated with the new logical field may be created based on the access method for the corresponding logical field in the previous data model. As an example, if the corresponding logical field specifies a filtered access method and a filter expression, the new logical field may be defined to have a filtered access method with the same filter expression. Of note, though, the database analysis component 135 may still update values in the filter expression to match the physical fields in the selected database to which the new logical field is mapped. For instance, if the filter expression from the previous data abstraction model specified the expression “T_DATA.code=ABC”, the database analysis component 135 may update the filter expression to “TestData.diagCode=ABC”, so as to reflect the structure of the selected database.
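The filter expression update in this example amounts to replacing qualified column references. A minimal Python sketch follows, assuming filter expressions reference physical fields in `table.column` form; the function name `remap_filter` and the shape of the `mapping` argument are hypothetical.

```python
import re


def remap_filter(expr, mapping):
    """Rewrite table.column references in a filter expression.

    `mapping` maps (old_table, old_column) to (new_table, new_column).
    """
    for (old_table, old_column), (new_table, new_column) in mapping.items():
        # Match the qualified name only as a whole token.
        pattern = rf"\b{re.escape(old_table)}\.{re.escape(old_column)}\b"
        replacement = f"{new_table}.{new_column}"
        expr = re.sub(pattern, lambda _m: replacement, expr)
    return expr


# Usage with the names from the example above.
updated = remap_filter(
    "T_DATA.code=ABC",
    {("T_DATA", "code"): ("TestData", "diagCode")},
)
```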

If the database analysis component 135 instead determines the table structure only partially matches the known table structure, then the database analysis component 135 creates a new virtual database configuration based, at least in part, on the existing configuration information (step 655). That is, while the entire structure of a particular database table may not be known, particular columns within the table may match known configuration information. In such a case, the database analysis component 135 may create the new virtual database configuration for the known portions of the table. If, after doing this, other portions of the table are still unknown, the database analysis component 135 may then flag these portions as unmatched, so that the customer may later manually enter the information for these portions. The method 600 then returns to step 630, where the database analysis component 135 determines whether there are more tables in the data sources to interrogate.

Once all the tables have been analyzed, the database analysis component 135 then creates a data abstraction model for the data sources (step 660). Once the data abstraction model is created, the method 600 ends. Advantageously, the method 600 enables data abstraction models to be efficiently created for new data sources. As manually creating a data abstraction model for a database can be expensive in terms of both time and resources, the savings gained from use of the method 600 may be substantial. These cost savings may be particularly substantial in a cloud computing environment, where the database analysis component 135 may analyze other databases and data abstraction models deployed in the cloud to identify similarities to the selected database. As an additional advantage, the method 600 may even perform this analysis anonymously, thus protecting any confidential information contained in the other databases and abstraction models in the cloud.

FIG. 7 is a flow diagram illustrating a method for analyzing a database, according to one embodiment of the present invention. As shown, the method 700 begins at step 705, where the database analysis component 135 analyzes the structure of a table in a data source. The method 700 then continues where, for the first column of the table, the database analysis component 135 determines whether the column matches existing virtual database configuration information (step 710). As discussed above, this determination may be made based upon, without limitation, the data type of the column, the data contained within the column, conformance of the data with any known standards, and so on.

If the database analysis component 135 determines that the column does match, the database analysis component 135 copies a matching field definition from the existing virtual database configuration information into the new configuration (step 715). For instance, upon determining the column matches a known column configuration, the database analysis component 135 may copy a portion of a data abstraction model associated with the known column configuration into a new data abstraction model for the data source being analyzed. In addition, the database analysis component 135 may update the copied portion of the data model to reflect the physical database structure of the database being analyzed.

If the database analysis component 135 determines that the column does not match any known configurations, the database analysis component 135 creates a new field definition for the column and identifies the new field definition as “unmatched” (step 720). Although a flag is used in the depicted example to designate the column as unmatched, such an example is for illustrative purposes only, and it is explicitly contemplated that other methods could be used to designate the columns as unmatched. For example, in one embodiment, the database analysis component 135 sets the new field definition to a default value to indicate that no match was found. In an alternate embodiment, the database analysis component 135 creates the new field definition but does not set it to any value at all, so as to indicate that no match was found. Moreover, one of ordinary skill in the art will recognize that any number of other methods could be used to designate the new field definition as unmatched.

Once the database analysis component 135 creates the new field definition, whether populated with the matching field definition information or flagged as unmatched, the database analysis component 135 determines whether there are more columns in the table to analyze (step 725). If so, the method 700 returns to step 705, where the database analysis component 135 analyzes the structure of the next column in the table. If the database analysis component 135 determines that there are no more columns to analyze, the method 700 ends. Advantageously, the method 700 enables a data abstraction model to be created for new data sources by identifying similar existing data models and populating portions of the new data abstraction model with corresponding portions from the similar data models. Doing so allows a logical data representation to be quickly and easily created for new data sources, thus saving the substantial cost in terms of time and resources required to manually create such a logical data representation.
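The per-column loop of method 700 can be sketched as follows. The matching rule used here (an exact match on lower-cased column name and data type) is a deliberately simple stand-in chosen for this illustration; the description leaves the matching criteria open. The function name `build_field_definitions` and the dictionary layout of a field definition are likewise hypothetical.

```python
def build_field_definitions(table, columns, known_columns):
    """Create one field definition per column, flagging unmatched columns.

    `columns` is a list of (column_name, data_type) pairs. `known_columns`
    maps (lower-cased column name, data type) to a known logical field name.
    """
    definitions = []
    for name, data_type in columns:
        logical_name = known_columns.get((name.lower(), data_type.upper()))
        if logical_name is not None:
            # Step 715: copy the known definition, remapped to this table.
            definitions.append({
                "name": logical_name, "table": table,
                "column": name, "unmatched": False,
            })
        else:
            # Step 720: create a new definition and flag it as unmatched.
            definitions.append({
                "name": name, "table": table,
                "column": name, "unmatched": True,
            })
    return definitions


# Usage with hypothetical inputs.
KNOWN_COLUMNS = {("diagcode", "VARCHAR"): "Diagnosis Code"}
fields = build_field_definitions(
    "TestData",
    [("diagCode", "varchar"), ("batchNo", "integer")],
    KNOWN_COLUMNS,
)
```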

FIG. 8 is a flow diagram illustrating a method for creating an abstraction model, according to one embodiment of the present invention. As shown, the method 800 begins at step 805, where the database analysis component 135 generates a relationship graph based on fields identified in the new data abstraction model configuration. That is, relationships in the cloud are analyzed to identify existing joins between tables/fields. In addition, attribute relationships may be analyzed. For example, assume field A and field B are identified, and field A has been defined as an attribute of field B 80% of the time. If the database analysis component 135 determines that this percentage exceeds a defined threshold amount, then the database analysis component 135 may automatically set field A as an attribute of field B for the new data source.
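The threshold test on attribute relationships reduces to a simple ratio comparison. In the sketch below, the default threshold of 0.75, the shape of the `observations` input and the function name `infer_attribute_relationships` are assumptions for illustration; the description only states that a defined threshold is used.

```python
def infer_attribute_relationships(observations, threshold=0.75):
    """Return the (attribute, owner) pairs whose observed rate exceeds threshold.

    `observations` maps (field_a, field_b) to a pair
    (times field_a was an attribute of field_b, times both fields were seen).
    """
    return sorted(
        pair
        for pair, (hits, total) in observations.items()
        if total and hits / total > threshold
    )


# Field A was an attribute of field B in 8 of 10 models (80%).
accepted = infer_attribute_relationships({
    ("A", "B"): (8, 10),
    ("C", "D"): (3, 10),
})
```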

The database analysis component 135 then copies existing dynamic conditions into the new configuration, based on the identified fields (step 810). The dynamic conditions are part of the overall configuration of a virtual database application, similar to the data abstraction model. Generally speaking, dynamic conditions allow a condition building UI for a database application to be customized. As an example, a dynamic condition may be used to display an advanced multi-field form to the user, allowing the user to create a single conditional statement which involves multiple fields of the form. As a second example, a dynamic condition could be used as part of an interface where users can select a state or country from a map instead of a dropdown list. In such an example, the dynamic condition may be easier or more intuitive for the developer to construct than a traditional conditional statement.

Additionally, these dynamic conditions may be related to fields or data types of fields in a data abstraction model. Continuing the example given above of an interface where users can select a state from a map, the dynamic condition may be associated with a state field in a data abstraction model. Similarly, a dynamic condition associated with a calendar could be associated with a date field of a data abstraction model. Accordingly, in addition to creating the new data abstraction model containing logical fields based on the relationship graph, the database analysis component 135 may be further configured to populate the new data abstraction model with dynamic conditions from an existing data abstraction model.
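
One possible way to copy dynamic conditions according to the fields identified for the new data source is sketched below. It assumes each dynamic condition records the single field type it is associated with (for example, a map picker with a state field); the `DynamicCondition` class, its attributes, and the condition names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DynamicCondition:
    """Hypothetical record of a dynamic condition and its associated field type."""
    name: str
    applies_to_type: str


def copy_dynamic_conditions(
    existing_conditions: List[DynamicCondition],
    new_fields: Dict[str, str],
) -> List[DynamicCondition]:
    """Keep only the conditions whose field type occurs in the new model.

    new_fields maps a logical field name to its field type.
    The relative order of the existing conditions is preserved.
    """
    types_present = set(new_fields.values())
    return [c for c in existing_conditions if c.applies_to_type in types_present]


if __name__ == "__main__":
    existing = [
        DynamicCondition("state_map_picker", "state"),
        DynamicCondition("calendar_picker", "date"),
        DynamicCondition("range_slider", "numeric"),
    ]
    fields = {"Home State": "state", "Admit Date": "date"}
    for condition in copy_dynamic_conditions(existing, fields):
        print(condition.name)
```

In this sketch the range slider is not copied because the new model contains no numeric field.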

Once the existing dynamic conditions are copied, the database analysis component 135 rearranges the order of the dynamic conditions based on identified cloud trends (step 815). Generally speaking, the dynamic conditions are arranged in a particular order, in which the dynamic conditions are processed. Accordingly, in addition to determining which dynamic conditions should be included in the data abstraction model based on the relationship graph, the database analysis component 135 further determines an ordering for the dynamic conditions based on the relationship graph specifying trends amongst other data abstraction models in the cloud.
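
The description does not specify how a cloud trend is measured. As a minimal sketch, the following assumes the trend is the average position at which each dynamic condition appears in the orderings of other data abstraction models; that measure and the function name `order_by_cloud_trend` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def order_by_cloud_trend(
    conditions: List[str],
    existing_orderings: List[List[str]],
) -> List[str]:
    """Order condition names by their mean position in existing orderings.

    Conditions never seen in any existing ordering are placed last,
    keeping their original relative order (sorted() is stable).
    """
    positions: Dict[str, List[int]] = defaultdict(list)
    for ordering in existing_orderings:
        for index, name in enumerate(ordering):
            positions[name].append(index)

    def mean_position(name: str) -> float:
        seen = positions.get(name)
        return sum(seen) / len(seen) if seen else float("inf")

    return sorted(conditions, key=mean_position)


if __name__ == "__main__":
    orderings = [
        ["state_map", "calendar"],
        ["state_map", "calendar", "other"],
        ["calendar", "state_map"],
    ]
    print(order_by_cloud_trend(["calendar", "state_map", "custom"], orderings))
```

Other trend measures, such as the most frequent position, could be substituted for the key function without changing the rest of the sketch.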

The database analysis component 135 then determines whether the user should be prompted to manually update any unmatched fields (step 820). Such a determination may be based on, for instance, whether any unmatched fields have been identified (e.g., at step 720). If the database analysis component 135 determines the user should be prompted, then the database analysis component 135 outputs the closest matches from the known configurations for display to the user. The user may then select one of the displayed matches to be used for the field in the data abstraction model. Alternatively, the user may manually enter information to be used for the field in the data abstraction model. This may be preferable, for instance, when the new data source does not conform to the structure and standards of the existing configurations, and thus none of the displayed matches is accurate for the new data source. Once the user has manually updated the unmatched fields, or alternatively if the database analysis component 135 determines that there are no unmatched fields to update, the database analysis component 135 creates the data abstraction model for the new data source (step 830), and the method 800 ends.
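
The handling of unmatched fields may be sketched as follows. The sketch assumes closeness is judged by string similarity of names, using `difflib.get_close_matches` from the Python standard library, and it models the user prompt as a callback; the similarity measure, the cutoff of 0.5, and the names `closest_matches`, `resolve_unmatched`, and `choose` are illustrative assumptions rather than the described implementation.

```python
import difflib
from typing import Callable, List


def closest_matches(
    unmatched_column: str,
    known_field_names: List[str],
    limit: int = 3,
) -> List[str]:
    """Return up to `limit` known field names most similar to the column name."""
    lowered = {name.lower(): name for name in known_field_names}
    hits = difflib.get_close_matches(
        unmatched_column.lower(), list(lowered), n=limit, cutoff=0.5
    )
    return [lowered[hit] for hit in hits]


def resolve_unmatched(
    unmatched_column: str,
    known_field_names: List[str],
    choose: Callable[[str, List[str]], str],
) -> str:
    """Display candidates through `choose` and return the user's decision.

    `choose` stands in for the user prompt: it may return one of the
    displayed candidates or a manually entered name.
    """
    candidates = closest_matches(unmatched_column, known_field_names)
    return choose(unmatched_column, candidates)


if __name__ == "__main__":
    known = ["Birth Date", "Zip Code"]
    # Stand-in for a user who accepts the first suggestion when one exists.
    pick_first = lambda column, options: options[0] if options else "Manual Name"
    print(resolve_unmatched("birth_dt", known, pick_first))
```

When no candidate is similar enough, the callback receives an empty list, which corresponds to the case in which the user enters the field information manually.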

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

What is claimed is:
 1. A computer program product for creating a first data abstraction model for a first database, comprising: a computer-readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising: computer readable program code to analyze the first database to determine a first set of structural characteristics for contents of the first database; computer readable program code to analyze a second database, distinct from the first database, to determine a second set of structural characteristics for contents of the second database, wherein a second data abstraction model is available for the second database that defines a plurality of logical fields that each map to at least one database field in the second database, and wherein each of the logical fields specifies at least (i) a logical field name and (ii) an access method selected from a plurality of distinct access methods, wherein the access method defines a technique for generating output based on the at least one database field the logical field maps to; computer readable program code to compare the first set of structural characteristics with the second set of structural characteristics to determine comparable structural characteristics of the first and second databases, relevant to the second data abstraction model of the second database; and computer readable program code to create the first data abstraction model for the first database, based on the contents of the first and second databases, the determined comparable structural characteristics of the first and second databases and the second data abstraction model for the second database.
 2. The computer program product of claim 1, wherein the computer readable program code to compare the first set of structural characteristics with the second set of structural characteristics, further comprises: computer readable program code to compare the first set of structural characteristics with a set of standardized data to determine whether at least a portion of the first database adheres to standards defined for the set of standardized data; computer readable program code to compare the second set of structural characteristics with the set of standardized data to determine whether at least a portion of the second database adheres to the standards defined for the set of standardized data; and computer readable program code to, upon determining both the first set of structural characteristics and the second set of structural characteristics adhere to the standards, identify a similarity between the portion of the first database and the portion of the second database.
 3. The computer program product of claim 1, wherein the first set of structural characteristics includes an indication of tables contained in the first database.
 4. The computer program product of claim 3, wherein the first set of structural characteristics, for at least one of the tables, further includes an indication of columns in the table, a data type of the columns in the table, and a relationship between at least two of the tables.
 5. The computer program product of claim 1, wherein the computer readable program code to analyze the second database is performed anonymously, wherein the data contained in the second database is not accessed during the analysis.
 6. The computer program product of claim 1, wherein the computer readable program code to create the first data abstraction model for the first database, further comprises: computer readable program code to, for each of the identified similarities, insert a corresponding portion of the second data abstraction model into the first data abstraction model; and computer readable program code to update the inserted portion to refer to corresponding physical database fields in the first database.
 7. The computer program product of claim 1, wherein both the first database and the second database are provided as services in a cloud computing environment.
 8. The computer program product of claim 1, wherein the first data abstraction model defines a second plurality of logical fields that each map to at least one database field in the first database, and wherein each of the logical fields specifies at least (i) a logical field name and (ii) an access method selected from a plurality of distinct access methods, wherein the access method defines a technique for generating output based on the corresponding at least one database field in the second database that the logical field maps to.
 9. A system, comprising: a processor; and a memory containing a program that, when executed on the processor, performs an operation for creating a first data abstraction model for a first database, comprising: analyzing the first database to determine a first set of structural characteristics for contents of the first database; analyzing a second database, distinct from the first database, to determine a second set of structural characteristics for contents of the second database, wherein an abstraction model is available for the second database that defines a plurality of logical fields that each map to at least one database field in the second database, and wherein each of the logical fields specifies at least (i) a logical field name and (ii) an access method selected from a plurality of distinct access methods, wherein the access method defines a technique for generating output based on at least one database field the logical field maps to; comparing the first set of structural characteristics with the second set of structural characteristics to determine comparable structural characteristics of the first and second databases, relevant to the second data abstraction model of the second database; and creating the first data abstraction model for the first database, based on the contents of the first and second databases, the determined comparable structural characteristics of the first and second databases and the second data abstraction model for the second database.
 10. The system of claim 9, wherein comparing the first set of structural characteristics with the second set of structural characteristics, further comprises: comparing the first set of structural characteristics with a set of standardized data to determine whether at least a portion of the first database adheres to standards defined for the set of standardized data; comparing the second set of structural characteristics with the set of standardized data to determine whether at least a portion of the second database adheres to the standards defined for the set of standardized data; and upon determining both the first set of structural characteristics and the second set of structural characteristics adhere to the standards, identifying a similarity between the portion of the first database and the portion of the second database.
 11. The system of claim 9, wherein the first set of structural characteristics includes an indication of tables contained in the first database.
 12. The system of claim 11, wherein the first set of structural characteristics, for at least one of the tables, further includes an indication of columns in the table, a data type of the columns in the table, and a relationship between at least two of the tables.
 13. The system of claim 9, wherein analyzing the second database is performed anonymously, wherein the data contained in the second database is not accessed during the analysis.
 14. The system of claim 9, wherein creating the first data abstraction model for the first database, further comprises: for each of the identified similarities, inserting a corresponding portion of the second data abstraction model into the first data abstraction model; and updating the inserted portion to refer to corresponding physical database fields in the first database.
 15. The system of claim 9, wherein both the first database and the second database are provided as services in a cloud computing environment.